PredRNN: Recurrent Neural Networks for Predictive Learning using Spatiotemporal LSTMs
Authors
Abstract
The predictive learning of spatiotemporal sequences aims to generate future images by learning from historical frames, where spatial appearances and temporal variations are two crucial structures. This paper models these structures by presenting a predictive recurrent neural network (PredRNN). The architecture is motivated by the idea that spatiotemporal predictive learning should memorize both spatial appearances and temporal variations in a unified memory pool. Concretely, memory states are no longer constrained inside each LSTM unit. Instead, they are allowed to zigzag in two directions: across stacked RNN layers vertically and through all RNN states horizontally. The core of this network is a new Spatiotemporal LSTM (ST-LSTM) unit that extracts and memorizes spatial and temporal representations simultaneously. PredRNN achieves state-of-the-art prediction performance on three video prediction datasets and provides a more general framework that can be easily extended to other predictive learning tasks by integrating it with other architectures.
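To make the dual memory flow concrete, the following minimal NumPy sketch wires together an ST-LSTM-style cell with a temporal cell state that flows horizontally within each layer and a spatiotemporal memory that zigzags vertically up the stack and back to the bottom layer at the next timestep. It is a sketch under stated assumptions: dense (fully connected) transforms stand in for the convolutions a video model would use, and the gate layout, names (STLSTMCell, predrnn_like_forward), and sizes are illustrative rather than the paper's exact formulation.

```python
# Minimal sketch of a dual-memory ST-LSTM-style cell and zigzag memory routing.
# Dense transforms replace convolutions; gate layout is an illustrative assumption.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class STLSTMCell:
    """One layer: keeps a temporal cell state C (per layer, across time) and
    consumes/produces a spatiotemporal memory M that travels up the stack
    within a timestep and returns to the bottom layer at the next timestep."""

    def __init__(self, input_size, hidden_size, rng):
        d = input_size + hidden_size
        # Gates driven by (input, previous hidden) -> temporal memory C.
        self.W_c = rng.standard_normal((d, 4 * hidden_size)) * 0.1
        # Gates driven by (input, incoming spatiotemporal memory M).
        self.W_m = rng.standard_normal((d, 3 * hidden_size)) * 0.1
        # Fusion of [C, M] into the hidden state (1x1 conv in the paper's setting).
        self.W_o = rng.standard_normal((2 * hidden_size, hidden_size)) * 0.1
        self.hidden_size = hidden_size

    def forward(self, x, h_prev, c_prev, m_in):
        n = self.hidden_size
        zc = np.concatenate([x, h_prev]) @ self.W_c
        i, f, o, g = (sigmoid(zc[:n]), sigmoid(zc[n:2*n]),
                      sigmoid(zc[2*n:3*n]), np.tanh(zc[3*n:]))
        c = f * c_prev + i * g                       # temporal memory (horizontal flow)

        zm = np.concatenate([x, m_in]) @ self.W_m
        i2, f2, g2 = sigmoid(zm[:n]), sigmoid(zm[n:2*n]), np.tanh(zm[2*n:])
        m = f2 * m_in + i2 * g2                      # spatiotemporal memory (vertical flow)

        h = o * np.tanh(np.concatenate([c, m]) @ self.W_o)
        return h, c, m

def predrnn_like_forward(frames, cells):
    """Zigzag routing: M climbs through the stacked cells at each timestep,
    then the top layer's M is fed back to the bottom layer at t+1."""
    L, n = len(cells), cells[0].hidden_size
    h = [np.zeros(n) for _ in range(L)]
    c = [np.zeros(n) for _ in range(L)]
    m = np.zeros(n)
    outputs = []
    for x in frames:                        # frames: iterable of flattened frame vectors
        inp = x
        for l, cell in enumerate(cells):
            h[l], c[l], m = cell.forward(inp, h[l], c[l], m)
            inp = h[l]                      # hidden state feeds the next layer up
        outputs.append(h[-1])               # top hidden state would be decoded into a frame
    return outputs

# Toy usage: 3 stacked cells, 10 random "frames" flattened to 64-dim vectors.
rng = np.random.default_rng(0)
cells = [STLSTMCell(input_size=64, hidden_size=64, rng=rng) for _ in range(3)]
frames = [rng.standard_normal(64) for _ in range(10)]
preds = predrnn_like_forward(frames, cells)
print(len(preds), preds[0].shape)           # -> 10 (64,)
```

The design choice the sketch highlights is that each layer maintains its own C across time (as in a standard stacked LSTM), while a single M is threaded through every layer and every timestep, which is what lets spatial and temporal representations share one memory pool.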
Similar references
TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition
Recent two-stream deep Convolutional Neural Networks (ConvNets) have made significant progress in recognizing human actions in videos. Despite their success, methods extending the basic two-stream ConvNet have not systematically explored possible network architectures to further exploit spatiotemporal dynamics within video sequences. Further, such networks often use different baseline two-strea...
Spatiotemporal Recurrent Convolutional Networks for Traffic Prediction in Transportation Networks
Predicting large-scale transportation network traffic has become an important and challenging topic in recent decades. Inspired by the domain knowledge of motion prediction, in which the future motion of an object can be predicted based on previous scenes, we propose a network grid representation method that can retain the fine-scale structure of a transportation network. Network-wide traffic s...
Predictive Coding for Dynamic Visual Processing: Development of Functional Hierarchy in a Multiple Spatiotemporal Scales RNN Model
This letter proposes a novel predictive coding type neural network model, the predictive multiple spatiotemporal scales recurrent neural network (P-MSTRNN). The P-MSTRNN learns to predict visually perceived human whole-body cyclic movement patterns by exploiting multiscale spatiotemporal constraints imposed on network dynamics by using differently sized receptive fields as well as different tim...
Improving Speech Recognition by Revising Gated Recurrent Units
Speech recognition is largely taking advantage of deep learning, showing that substantial benefits can be obtained by modern Recurrent Neural Networks (RNNs). The most popular RNNs are Long Short-Term Memory (LSTMs), which typically reach state-of-the-art performance in many tasks thanks to their ability to learn long-term dependencies and robustness to vanishing gradients. Nevertheless, LSTMs ...
Super Mario as a String: Platformer Level Generation Via LSTMs
The procedural generation of video game levels has existed for at least 30 years, but only recently have machine learning approaches been used to generate levels without specifying the rules for generation. A number of these have looked at platformer levels as a sequence of characters and performed generation using Markov chains. In this paper we examine the use of Long Short-Term Memory recurre...